FC vs. average pooling in CNNs

Monday, March 6, 2017

3:31 PM

FC Layer in CNN

 

What role does the Fully Connected Layer play in a CNN?

 

- Nodes produced after CONV, ReLU, and Pooling reflect only a part of the previous input. = Local Connectivity

- A Fully Connected Layer is connected to every node of the previous layer.

- In other words, its neurons are connected to the entire input image, not just a part of it.

- (As the original slide puts it:) Contains neurons that connect to the entire input volume, as in ordinary Neural Networks
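A toy sketch of this difference (assuming PyTorch; the sizes are made up): each output of the convolution sees only a small local patch of the input, while each output of the linear (fully connected) layer sees the whole flattened input volume.

# Local connectivity (conv) vs. full connectivity (FC) on a toy 32x32 RGB input
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)

conv = nn.Conv2d(3, 8, kernel_size=3)   # each output value sees only a 3x3x3 patch of the input
fc = nn.Linear(3 * 32 * 32, 8)          # each output value sees the entire input volume

print(conv(x).shape)                    # torch.Size([1, 8, 30, 30])
print(fc(x.flatten(1)).shape)           # torch.Size([1, 8])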

 

[Slide image: "Fully Connected Layer (FC layer): Contains neurons that connect to the entire input volume, as in ordinary Neural Networks"; a repeated CONV → RELU → POOL stack ending in an FC layer that outputs class scores (plane, horse, ...)]
 

GoogLeNet, Global average pooling

 

GoogLeNet used global average pooling instead of an FC layer.

 

FC

- Each and every neuron is connected to the entire image input. This is what "fully connected" means.

- However, it requires an enormous number of parameters, accounting for most of the parameters of the whole CNN model (a rough count is sketched after this list).

- And all of these parameters have to be trained as well (via backpropagation).

- This makes it prone to overfitting.
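To get a feel for the scale, here is a rough count with hypothetical sizes (a 13*13*2048 activation volume flattened into a 4096-unit FC layer, versus global average pooling followed by one small linear classifier):

# Rough parameter count; the layer sizes are illustrative assumptions, not GoogLeNet's exact numbers.
h, w, c, fc_units, n_classes = 13, 13, 2048, 4096, 1000

fc_params = h * w * c * fc_units + fc_units       # weights + biases of the first FC layer
gap_params = 0                                    # global average pooling has nothing to learn
linear_params = c * n_classes + n_classes         # a single linear layer placed after GAP

print(f"{fc_params:,}")      # 1,417,678,848  (~1.4 billion)
print(f"{gap_params:,}")     # 0
print(f"{linear_params:,}")  # 2,049,000      (~2 million)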

 

global average pooling

 

- The entire activation map produced by one filter is average-pooled into a single value.

- For example, if the input is 13*13*2048,

1) this means that each filter swept over the entire image and output a 13*13*1 map,

2) and that there were 2048 such filters.

3) Applying global average pooling turns each 13*13 map into a single scalar value, so the overall output is 1*1*2048.

- Therefore, each 13*13*1 activation map represents the entire image, and the single value obtained by average-pooling it also represents the entire image.

- In the end, 2048 nodes are output, each carrying a representation of the whole image. Feed them into softmax and you are done (a short sketch follows this list).

- With global average pooling:

1) there are 0 parameters, so there is nothing to train.

2) both the averaging and the pooling act as countermeasures against overfitting.

3) Also, whereas an FC layer is a black box, with global average pooling each node comes to stand for a particular feature.
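A minimal NumPy sketch of the 13*13*2048 example above (random values; note that real networks such as GoogLeNet typically still add one small linear layer between GAP and the softmax):

# Global average pooling over a hypothetical 13x13x2048 activation volume
import numpy as np

activations = np.random.rand(13, 13, 2048)   # one 13x13 map per filter, 2048 filters

gap = activations.mean(axis=(0, 1))          # collapse each 13x13 map to one scalar -> shape (2048,)

# Softmax over the pooled values (a real model would usually map 2048 -> number of classes first)
probs = np.exp(gap - gap.max())
probs /= probs.sum()

print(gap.shape)    # (2048,)
print(probs.sum())  # 1.0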

 

average pooling vs max pooling

 

Link: http://wiki.fast.ai/index.php/Lesson_7_Notes

 

- Average pooling: how cat-like is the picture as a whole?

- Max pooling: which part of the picture is the most cat-like?
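A toy NumPy illustration of this intuition, using a single made-up "cat-ness" activation map in which only a small patch fires strongly:

import numpy as np

cat_map = np.zeros((13, 13))
cat_map[5:7, 5:7] = 0.9     # a small but very cat-like region

print(cat_map.mean())       # global average pooling: ~0.02 -> "the picture as a whole is not very cat-like"
print(cat_map.max())        # global max pooling: 0.9 -> "the most cat-like patch is very cat-like"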

 

Global Average Pooling
Recall from earlier our simple model that used a Global Average Pooling layer. This layer works similar to max pooling, except that instead of replacing entire areas with the maximum value, it replaces them with the average. In our example, the output of the Resnet blocks is 13x13 with 2048. The way we would implement global average pooling is to take the average of all values across the entire 13x13 matrix (hence the term global), and do that for each filter. So the output of global average pooling on the aforementioned matrix would be 2048x1.
The reason why using global average pooling and one dense layer was more successful than a deeper fully connected network is because Resnet was trained with this layer in it, and therefore the filters it creates were designed to be averaged together. Global average pooling also means we don't necessarily have to use dropout, because we have a lot fewer parameters in our dense layers. This of course helps in preventing over-fitting, and overall these layers help make models that are very generalizable.
One way to intuit the difference between average and max pooling is in how it treats the downsampled "images" we're left with after the convolutional layers. In classifying cats vs. dogs, averaging over the image tells us "how doggy or catty is this image overall." Since a large part of these images are all dogs and cats, this would make sense. If you were using max pooling, you are simply finding "the most doggy or catty" part of the image, which probably isn't as useful. However, this may be useful in something like the fisheries competition, where the fish occupy only a small part of the picture.

 

 

 

 

Reference links

 

1) GoogLeNet slides

Link: GoogLeNet Insights

 

GoogLeNet Insight #5
End with Global Average Pooling Layer Instead of Fully Connected Layer
[Slide diagram: feature maps → Global Average Pooling → linear layer for softmax, adapting to other labels]
- Fully-Connected layers are prone to over-fitting (hampers generalization)
- Average Pooling has no parameter to optimize, thus no over-fitting.
- Averaging is more native to the convolutional structure: natural correspondence between feature maps and categories, leading to easier interpretation
- Average Pooling does not exclude the use of Dropout, a proven regularization method to avoid over-fitting.

 

2) Paper: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

링크: https://arxiv.org/pdf/1602.07261.pdf

 

Average Pooling in Inception-v4

- Takes 8*8*1536 as input and outputs 1536

Average Pooling in Inception-ResNet-v1

- Takes 8*8*1792 as input and outputs 1792

[Figure: top of the Inception-v4 network: 3 x Inception-C (output 8x8x1536) → Average Pooling (output 1536) → Dropout (keep 0.8) → Softmax]

 

[Figure: top of the Inception-ResNet-v1 network: 5 x Inception-ResNet-C → Average Pooling → Dropout (keep 0.8) → Softmax]
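For reference, the pooling step shown in these figures can be reproduced in a few lines of PyTorch (only the tensor sizes come from the paper; everything else about the networks is omitted):

import torch
import torch.nn as nn

x_v4 = torch.randn(1, 1536, 8, 8)    # output of 3 x Inception-C in Inception-v4
x_rn = torch.randn(1, 1792, 8, 8)    # output of 5 x Inception-ResNet-C in Inception-ResNet-v1

gap = nn.AdaptiveAvgPool2d(1)        # average over the entire 8x8 spatial extent

print(gap(x_v4).flatten(1).shape)    # torch.Size([1, 1536])
print(gap(x_rn).flatten(1).shape)    # torch.Size([1, 1792])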

 

 

3) Paper: Network in Network

링크: https://arxiv.org/pdf/1312.4400.pdf

3.2 Global Average Pooling
Conventional convolutional neural networks perform convolution in the lower layers of the network. For classification, the feature maps of the last convolutional layer are vectorized and fed into fully connected layers followed by a softmax logistic regression layer [4] [8] [11]. This structure bridges the convolutional structure with traditional neural network classifiers. It treats the convolutional layers as feature extractors, and the resulting feature is classified in a traditional way.
However, the fully connected layers are prone to overfitting, thus hampering the generalization ability of the overall network. Dropout is proposed by Hinton et al. [5] as a regularizer which randomly sets half of the activations to the fully connected layers to zero during training. It has improved the generalization ability and largely prevents overfitting.
In this paper, we propose another strategy called global average pooling to replace the traditional fully connected layers in CNN. The idea is to generate one feature map for each corresponding category of the classification task in the last mlpconv layer. Instead of adding fully connected layers on top of the feature maps, we take the average of each feature map, and the resulting vector is fed directly into the softmax layer. One advantage of global average pooling over the fully connected layers is that it is more native to the convolution structure by enforcing correspondences between feature maps and categories. Thus the feature maps can be easily interpreted as categories confidence maps. Another advantage is that there is no parameter to optimize in the global average pooling, thus overfitting is avoided at this layer. Furthermore, global average pooling sums out the spatial information, thus it is more robust to spatial translations of the input.
We can see global average pooling as a structural regularizer that explicitly enforces feature maps to be confidence maps of concepts (categories). This is made possible by the mlpconv layers, as they make better approximation to the confidence maps than GLMs.
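A small NumPy sketch of the idea described above, with toy sizes (10 categories, 6x6 maps): the last mlpconv layer emits one feature map per category, each map is globally averaged, and the resulting vector goes straight into softmax with no FC layer in between.

import numpy as np

n_classes = 10
feature_maps = np.random.rand(n_classes, 6, 6)   # one confidence map per category

scores = feature_maps.mean(axis=(1, 2))          # global average pooling -> one score per category
probs = np.exp(scores - scores.max())
probs /= probs.sum()

print(probs.shape)     # (10,)
print(probs.argmax())  # index of the predicted category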

 

 

 

 

 

 

 

 

 

 

 

Created with Microsoft OneNote 2016